LEASTQ is a general purpose curve fitting program designed to find the
coefficients a[i] of a function, y(x), of the form:
(1) y(x) = a[0] + a[1]*f1(x) + a[2]*f2(x) + . . . .
where fi(x) are some functions of x. Data for this program consists of a
set of points, each one described by three numbers, x[k], y[k] and err[k],
where y[k] is an experimental value measured at x[k] and err[k] is the
uncertainty in this measurement. The program finds the coefficients a[i]
that minimize Chisq, the sum of the squares of the deviations of the
calculated y from the measured y, each divided by the square of its error.
(2) Chisq = Sum(k = 1 to Npts) of (y[k] - ycalc(x[k]))^2 / err[k]^2
where ycalc(x[k]) is y calculated from x[k] with formula (1) above. The
procedure of minimizing Chisq is called least squares fitting.
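As a rough illustration of equations (1) and (2), the following Python sketch evaluates ycalc and Chisq for a given set of coefficients. The function names and the choice of f1(x) = x, f2(x) = x^2 are assumptions made for this example; they are not taken from LEASTQ itself.

```python
def ycalc(x, a, funcs):
    """Equation (1): a[0] + a[1]*f1(x) + a[2]*f2(x) + ..."""
    return a[0] + sum(ai * f(x) for ai, f in zip(a[1:], funcs))


def chisq(points, a, funcs):
    """Equation (2): squared deviations, each divided by err squared."""
    return sum(((y - ycalc(x, a, funcs)) / err) ** 2 for x, y, err in points)


# Hypothetical example: f1(x) = x, f2(x) = x^2, three points with error 0.5.
funcs = [lambda x: x, lambda x: x * x]
points = [(0.0, 1.0, 0.5), (1.0, 2.0, 0.5), (2.0, 5.0, 0.5)]

# y = 1 + x^2 passes through every point, so Chisq is zero.
print(chisq(points, [1.0, 0.0, 1.0], funcs))
# Dropping the constant term misses each point by 1, i.e. by two errors.
print(chisq(points, [0.0, 0.0, 1.0], funcs))
```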
To call this program, type LEASTQ [MYFILE.DAT] [/E] <CR> from the command
line. The parameters in square brackets are optional. MYFILE.DAT is the
generic name of the file containing the data you want to fit; if this
parameter is omitted, the opening screen of the program will show you a
list of all the files in your current directory with the extension DAT
and you can select one of these files by choosing its number. The '/E'
parameter is used to force the program to use EGA rather than VGA graphics.
Otherwise the program will use VGA graphics if it detects that your machine
has VGA available. The reason for this option is that you may want to run
LEASTQ with a graphics screen dump TSR loaded. Some of these, such as
SCRNDU, do not handle VGA graphics properly.
LEASTQ assumes that any file with extension 'DAT' is a data file for this
program. If you have other files with this extension on the disk where you
keep LEASTQ, you may want to consider moving them to another disk or
directory so that you don't call them by mistake.
A data file for this program consists of three columns of numbers. The
first column contains the values of x, the second column the corresponding
values of y, and the third column the error in y. Each line in the
data file, therefore, corresponds to a data point. The proper format can be
ascertained from the sample data files that have been provided. Spacing of
the columns is not critical and comments placed after the third column are
ignored by the program. You can generate a data file with an editor or word
processor, or a data file could be generated by an application program. You
can also generate a data file from LEASTQ. Start the program without
specifying a data file in the command line and choose 'none of the above'
from the menu of .DAT files in your directory. You will be put into an
editor that will allow you to enter data from the keyboard which you can
subsequently save to the disk if you wish.
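For anyone generating data files from another program, the three-column layout described above can be sketched in Python as follows. The function name and the handling of blank or non-numeric lines (they are skipped) are assumptions; the text does not say how LEASTQ treats such lines.

```python
def parse_data_lines(lines):
    """Return a list of (x, y, err) tuples from three-column text lines."""
    points = []
    for line in lines:
        fields = line.split()          # column spacing is not critical
        if len(fields) < 3:
            continue                   # blank or incomplete line: skipped here
        try:
            x, y, err = (float(f) for f in fields[:3])
        except ValueError:
            continue                   # non-numeric line: skipped here
        points.append((x, y, err))     # text after the third column is ignored
    return points


sample = [
    "0.0   1.02  0.05",
    "  0.5 1.48  0.05   repeated measurement",
    "1.0\t2.01\t0.05",
]
print(parse_data_lines(sample))
```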
When you enter the keyboard routine, you will be asked if your data has
evenly spaced values of x; if so, enter the first x and the step
between the x's. The program will then fill in the values of x for you.
This can save a lot of time. If you enter the values of x yourself, you can
put your data in any order; LEASTQ will sort your data in order of
ascending x when you are finished. It is perfectly acceptable for two
points to have the same x coordinate; this sometimes happens with real data
when you go back and repeat a measurement.
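The two conveniences just described, filling in evenly spaced x values and sorting by ascending x, amount to something like the Python sketch below. The function names are hypothetical, and keeping repeated x values in the order they were entered is an assumption.

```python
def evenly_spaced_x(first, step, count):
    """x values generated from the first x and the step between the x's."""
    return [first + k * step for k in range(count)]


def sort_by_x(points):
    """Sort (x, y, err) points by ascending x; ties keep their entry order."""
    return sorted(points, key=lambda p: p[0])


print(evenly_spaced_x(0.0, 0.5, 4))

# Two points share x = 2.0, as happens when a measurement is repeated.
entered = [(2.0, 5.0, 0.1), (1.0, 3.0, 0.1), (2.0, 4.9, 0.1)]
print(sort_by_x(entered))
```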
You will then be asked if the dependent variable, y, is a count of
something: the number of beans in a jar, for instance. In that case, the
program will calculate the appropriate value of the error, which is the
square root of y. If you define the data as a count, you will not be
allowed to enter negative values of y. If you want all data points to have
the same error, enter the first error and then just <cr> for subsequent
entries; the error for the current point will be taken from the previous
point. All data points MUST have a nonzero error; any data point without an
error will be ignored by the program. This is true, also, of data read from
a file; points with zero error will be ignored.
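The error rules above can be summarized in a short Python sketch. All three function names are hypothetical. Note one consequence that the text does not address: a count of zero gets an error of zero under the square-root rule and would then be dropped by the zero-error rule; whether LEASTQ behaves that way is not stated.

```python
import math


def count_errors(counts):
    """For counting data, the error on each y is sqrt(y)."""
    points = []
    for x, y in counts:
        if y < 0:
            raise ValueError("a count cannot be negative")
        points.append((x, y, math.sqrt(y)))
    return points


def fill_errors(entries):
    """entries are (x, y, err); err=None stands for a bare <cr>,
    which reuses the error of the previous point."""
    points = []
    previous = 0.0                 # nothing entered yet
    for x, y, err in entries:
        if err is None:
            err = previous
        points.append((x, y, err))
        previous = err
    return points


def drop_zero_error(points):
    """Points with zero error are ignored."""
    return [p for p in points if p[2] > 0]


print(count_errors([(1, 16), (2, 25)]))
print(fill_errors([(0, 1.0, 0.2), (1, 2.0, None), (2, 3.0, None)]))
print(drop_zero_error([(0, 1.0, 0.0), (1, 2.0, 0.5)]))
```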
You can use the cursor keys to go back and correct data that was
entered incorrectly.
Hit <esc> to exit the data entry routine and return to the main program.
You must have at least 3 valid data points to run this program and you are
allowed a maximum of 1024 data points. You will exit automatically if you
exceed this number. The limit applies also to data read from a file. If the
program is reading data from a file with more than 1024 entries, it will
exit after the 1024th data point.
It is difficult to enter a large number of data points without making a
few mistakes - forgetting a decimal point, for instance. When you have
finished entering data from the keyboard, you will be shown a plot of your
data. If you see some obvious mistakes, press <N> at the question 'Data
OK?'. Count the points from the left to find the point you want to change;
the first point on the left is #1. Enter the number of the point; it will be
highlighted in blinking white and you will be asked to confirm that it is
the point you want to change. You will be asked to replace x, y and err for
the point you have chosen. Entering <CR> leaves the quantity unchanged. If
there are additional points you would like to correct, press <Y> at the
question 'more'; press <N> when you are finished correcting your data. Then
you will be given the option of storing your data as a file for later
use. Note that you should not type the extension 'DAT' after your file name; the
program will do that for you. A data file will be created in your current
directory with the name you have given.
After you have loaded your data from a file or from the keyboard, you
are shown a menu of functions that can be used to fit the data. A library
of commonly used functions has been provided. Select the function you wish
to use by typing the highlighted letter. The appropriate function is often
determined by the differential equation you want to solve and the boundary
conditions. In cases where the choice of function is open, it is sometimes
useful to look at a plot of the data before selecting the species of
function to use. Selecting 'V' (for 'View') from the menu will display a
plot of the data together with the list of functions, and your choice can
be made while viewing this plot; otherwise, hitting any key will return to
the select functions menu.
After you have selected your function, you will be asked to supply a
range over which x is to vary. For example, if your data consists of the
Dow Jones Industrial Average, y, versus the year, x, over the range 1948 to
1988, you would probably want x to run over a range from 0 to 40 or perhaps
0 to 1, rather than 1948 to 1988, so that the coefficients, a[i] will have
a reasonable size. Note that if you choose Powers and a range of 0 to 1.0,
all coefficients will have the same weight in the calculated value of y at
the maximum x; you can see at a glance if the a[i] are converging to 0 as i
increases. For trigonometric functions, you will be asked the number of
cycles that the range of x is to represent. Elliott wave theorists can play
around with those Dow Jones averages. For Chebychev and Legendre
polynomials the range of x should lie within -1.0 to +1.0. For Bessel
functions the lower limit must be greater than or equal to 0, and the upper limit
should be chosen no greater than 25 because of the limitations in the
algorithm used in this program to calculate Bessel functions.
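The Dow Jones example can be pictured as a linear rescaling of x. The Python sketch below assumes that the smallest x in the data is mapped to the lower limit of the range and the largest x to the upper limit; the text does not spell out the mapping LEASTQ uses, and the function name is hypothetical.

```python
def rescale(xs, lo, hi):
    """Map xs linearly so that min(xs) becomes lo and max(xs) becomes hi."""
    x_min, x_max = min(xs), max(xs)
    if x_max == x_min:
        raise ValueError("all x values are equal; nothing to rescale")
    return [lo + (hi - lo) * (x - x_min) / (x_max - x_min) for x in xs]


years = [1948, 1958, 1968, 1978, 1988]
print(rescale(years, 0.0, 40.0))   # years since 1948
print(rescale(years, 0.0, 1.0))    # fraction of the full span
```

With the 0 to 1 range, every power of x equals 1 at the last point, which is why all coefficients carry the same weight in the calculated y there.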
Next you will be asked the number of coefficients in the fit. The
minimum allowable number is 2, the largest number is 12 with the restriction
that the number of coefficients must be less than the number of data
points. After each calculation the program will display a plot of the
data fitted to the curve calculated by equation (1), the Chisq of the
fit as calculated from equation (2) (the lower the Chisq, the better
the fit) and ask you 'accept fit?'. If you think more coefficients are
needed, answer 'N'. It is good practice to start with a small number of
coefficients and increase this number until adding another coefficient
produces only a small decrease in Chisq. The algorithm for calculating
the a[i] involves a matrix inversion. With a large number of
coefficients this inversion sometimes bombs; the bomb will place you
back in the menu at the place where you choose the number of
coefficients and you will just have to settle for a smaller number of
coefficients. Sometimes increasing the range of x will permit you to
use more coefficients.
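To make the procedure concrete, here is a minimal Python sketch of a weighted least squares fit to powers of x. It builds the usual normal equations and solves them by Gaussian elimination with partial pivoting rather than an explicit matrix inversion, so it is not a reproduction of the algorithm in LEASTQ. The function names, the singularity threshold and the sample data are all assumptions.

```python
def solve(matrix, rhs):
    """Solve matrix * a = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:      # arbitrary threshold for this sketch
            raise ArithmeticError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    a = [0.0] * n
    for i in reversed(range(n)):              # back substitution
        s = sum(aug[i][j] * a[j] for j in range(i + 1, n))
        a[i] = (aug[i][n] - s) / aug[i][i]
    return a


def fit_powers(points, ncoef):
    """Fit y = a[0] + a[1]*x + a[2]*x^2 + ...; returns (a, chisq)."""
    basis = [[x ** i for i in range(ncoef)] for x, _, _ in points]
    weights = [1.0 / err ** 2 for _, _, err in points]
    ys = [y for _, y, _ in points]
    matrix = [[sum(w * b[i] * b[j] for w, b in zip(weights, basis))
               for j in range(ncoef)] for i in range(ncoef)]
    rhs = [sum(w * b[i] * y for w, b, y in zip(weights, basis, ys))
           for i in range(ncoef)]
    a = solve(matrix, rhs)
    chisq = sum(w * (y - sum(ai * bi for ai, bi in zip(a, b))) ** 2
                for w, b, y in zip(weights, basis, ys))
    return a, chisq


# Made-up data lying exactly on y = 1 + 2x + 3x^2, with error 0.1 on each point.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
points = [(x, 1.0 + 2.0 * x + 3.0 * x * x, 0.1) for x in xs]

# Start small and add coefficients until Chisq stops dropping appreciably.
for ncoef in range(2, 5):
    a, chisq = fit_powers(points, ncoef)
    print(ncoef, chisq)
```

Here Chisq falls sharply from two to three coefficients and gains nothing from a fourth, which is the signal to stop at three.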
In two special cases, function species 'Exponen' and 'Gaussian', you are
allowed two and three coefficients respectively; the step of choosing the
number of coefficients is bypassed.
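One common way to fit an exponential with just two coefficients is to fit a straight line to the logarithm of y. The Python sketch below does that, assuming the model y = A*exp(-x/L) with L the decay length and assuming the error on ln(y) is approximately err/y. Whether LEASTQ's 'Exponen' species uses exactly this parameterization is not stated in this document; the function name and sample data are hypothetical.

```python
import math


def fit_exponential(points):
    """Weighted straight-line fit to ln(y); returns (A, L) for y = A*exp(-x/L)."""
    s = sx = st = sxx = sxt = 0.0
    for x, y, err in points:
        if y <= 0 or err <= 0:
            continue                  # the logarithm needs positive y
        t = math.log(y)
        w = (y / err) ** 2            # weight = 1 / (err / y)^2
        s += w
        sx += w * x
        st += w * t
        sxx += w * x * x
        sxt += w * x * t
    delta = s * sxx - sx * sx
    slope = (s * sxt - sx * st) / delta
    intercept = (sxx * st - sx * sxt) / delta
    return math.exp(intercept), -1.0 / slope


# Made-up counting data on an exact exponential with decay length 15.
points = []
for x in range(0, 70, 10):
    y = 100.0 * math.exp(-x / 15.0)
    points.append((float(x), y, math.sqrt(y)))

amplitude, decay_length = fit_exponential(points)
print(amplitude, decay_length)
```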
If you answer 'y' to accept the fit, the program will list the
coefficients, a[i], and a comparison of experimental y[k] with calculated
y's for all data points, the chisq of your fit and a quantity called the
confidence level, which will be discussed below. The program will then
present you with a menu of options. Choose 'D' to try a new data file.
Choose 'R' to try a different range in x. Choose 'S' if you want to try a
different species of function. For instance, you may have fit the data to
Sines and now notice that the coefficients of sin(2*x), sin(4*x) etc. are
very small and have large errors; you might then try fitting with Oddsines.
The most general fit to trigonometric functions is called Fourier; you can
start with this and later choose the species of trigonometric function that
gives the best fit to your data.
Sometimes it becomes evident that your fit is not working out; the
Chisq remains large or the matrix inversion bombs when you try to use an
adequate number of coefficients. In this case, it is best to pretend to
'buy' the solution (answer 'Y' to 'accept fit?') so as to get into a menu
where you can pick a different species of function or a different range in
x. The characteristics of a good fit are that the coefficients are
reasonably small and converging, that is |a[i+1]| < |a[i]|, and that the
errors are small, at least for the first few terms.
Finally, if your inability to get a good fit appears to be the result of
just one or two bad points, you can edit the data so as to bring the
troublemakers in line. Choose option 'F' for 'Fudge data'. You will get a
plot of your data together with your most recent fit from which you can
adjust points with the same procedure used to correct bad points after
entering them from the keyboard. Points which have been fudged are
henceforth plotted in green, whereas the original unchanged data points are
plotted in red. You realize, of course, that using this procedure with
actual experimental data is highly unethical, so you will have to answer to
your conscience if you use this option. Because LEASTQ does not want to
become your accomplice in crime, it will not allow you to save your
shamelessly doctored data to disk.
Choose option 'Q' to quit the program. The program will ask you if you want
a printed output of your results. Hit 'Y' to get a printed record of your
fit together with a comparison of your data with the calculated values.
Some data files have been provided for demonstration purposes:
SCURVE Try fitting to Powers. Note that Chisq does not decrease much
after 4 coefficients. Also try Bessels with a range of 0 to 1.0.
SQUAREWV Try Sines with range = 1.0 cycles. Then, since the even values
obviously aren't pulling their weight in the boat, try Oddsines.
Note that this data does not represent a true square wave; the
sides have a finite slope. A good fit to a true square wave
requires more terms than this program allows.
SAWTOOTH Try Sines with range = 1.0 cycles. Note that with 8 coefficients
the stupid computer thinks that it has done a fantastic job of
fitting the data. You, the clever human being, have to set it
straight. Six coefficients is about the best fit.
TRAPEZD Try Powers with a range from 0 to 1. Note that it takes many
coefficients to get a good fit and that the coefficients, a[i],
are very big and have big errors. A power series is not a good choice for
extrapolation; the expression will explode outside the fitted
range. Then try Oddsines with range = 0.5 cycles.
EXPONEN Obviously, the first function to try is Exponen. Set the range from
0 to 60 so that you get the real decay length as coefficient a[1].
You will get a logarithmic plot which gives you a good view of the
points with small y but a poor view of the points with large y.
Then try Powers which will give you an arithmetic plot and a good
view of the big points. Powers with 6 coefficients gives a good fit.
GAUSSIAN Try Gauss, another special case. The log of the data is fit to a
parabola with only three coefficients allowed and the plot is
logarithmic. Then try Oddsines with range = 0.5 cycles to see an
arithmetic plot.
THERMOCP Data for this were taken from the Handbook of Chemistry and Physics
with very small errors assigned. Try Powers with a range of x from
0 to 27.39. (Since you intend to use the calculated coefficients in a
programmable pocket calculator, you use the real range of x so you
can enter the coefficients directly.) The plot is not much use
here; the errors are too small to see. Keep an eye on Chisq and
keep adding terms until a decent fit is obtained. 6 coefficients
are sufficient to give < 0.1 degree accuracy over the range of x.
The first coefficient, a[0], is negligible and should be thrown
away. Try also Tscheby (Chebychev Polynomials) with range 0 to 1.
For most of these data files the errors were chosen arbitrarily; thus,
the absolute value of chisq has no particular significance and is merely a
relative indicator of goodness of fit. For data from actual experimental
measurements where a reliable estimate of the uncertainty is available, the
value of Chisq is significant. A rough rule of thumb is that Chisq should be
somewhat smaller than the number of degrees of freedom, that is, the number
of data points minus the number of coefficients. A more precise
evaluation of the significance of the Chisq obtained is the Confidence
Level, which is calculated every time you 'buy' a fit. The confidence level
is a number that varies between 0 and 1. A perfect fit corresponds to a
confidence level of 1.000. An extremely poor fit is described by a
confidence level of 0.000. In practice, any confidence level over 0.8 is
a pretty good fit; with a confidence level under 0.5 you had better think
about framing a new hypothesis, i.e., choosing a different species of
function or range in x. When choosing between two species of functions, the
confidence level rather than the value of chisq is the best indicator of
goodness of fit.
For example, consider once again the data file GAUSSIAN, which was
created with a random number generator and statistical errors. Using
function species Gaussian you will get a chisq of 23.839 for 31 degrees of
freedom giving a confidence level of 0.8171. Now try species Oddsines with
12 coefficients. You will get a chisq of 19.31, which looks like a better
fit, but with only 22 degrees of freedom your confidence level is 0.6259.
What is happening is that with 12 coefficients to play around with LEASTQ
can get cute and adjust to the noise in the data. Oddsines with only 6
coefficients is a better fit giving a confidence level of 0.7533.
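The numbers in this example are consistent with the confidence level being the upper-tail probability of the chi-square distribution, that is, the chance of getting a Chisq at least this large from statistical scatter alone. The Python sketch below works on that assumption, using the standard closed-form series for integer degrees of freedom; it is not necessarily how LEASTQ computes the value, and the function name is hypothetical.

```python
import math


def confidence_level(chisq, dof):
    """Upper-tail chi-square probability for dof degrees of freedom."""
    if dof < 1:
        raise ValueError("need at least one degree of freedom")
    if chisq <= 0:
        return 1.0
    half = chisq / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-half) * sum of half^j / j! for j = 0 .. dof/2 - 1
        term = 1.0
        total = 1.0
        for j in range(1, dof // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd dof: start from erfc(sqrt(half)) and add the half-integer terms.
    term = 2.0 * math.sqrt(half / math.pi)
    total = 0.0
    for j in range((dof - 1) // 2):
        total += term
        term *= half / (j + 1.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Degrees of freedom = number of data points - number of coefficients.
print(confidence_level(23.839, 31))   # Gaussian fit quoted in the text
print(confidence_level(19.31, 22))    # Oddsines with 12 coefficients
```

Under this assumption the two printed values should come out close to the 0.8171 and 0.6259 quoted in the text, which shows how a smaller Chisq can still mean a lower confidence level once the lost degrees of freedom are taken into account.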
While the higher the confidence level, the better the fit, you should be
suspicious of experimental data with too high a confidence level; it may
indicate that some unscrupulous person has Fudged the data. Try fudging
GAUSSIAN. With just a few changed points you can get a CL of 0.998.
***************************************
This program is free and may be distributed by bulletin boards and archiving
services at their usual rates. Report bugs and suggestions for improvement
to the author:
John D. Fox
309 NW 24th Street
Gainesville, FL 32607
CIS: 71270,2304